An important problem in statistics is the construction of confidence regions
for unknown parameters. In most cases, asymptotic distribution theory is used to
construct confidence regions, so any coverage probability claims only hold approximately,
for large samples. This paper describes a new approach, using random sets,
which allows users to construct exact confidence regions without appeal to asymptotic
theory. In particular, if the user-specified random set satisfies a certain validity
property, confidence regions obtained by thresholding the induced data-dependent
plausibility function are shown to have the desired coverage probability.

Keywords and phrases: Coverage probability; inferential model; plausibility 
function; predictive random set; validity. 

AMS subject classification: 62F25; 60D05; 62E15.

1 Introduction

A fundamental problem in statistics is that of constructing confidence regions. Roughly 
speaking, a confidence region is a data-dependent subset of the parameter space with the 
interpretation that all values inside this subset are "reasonable" estimates of the unknown
parameter. The more precise interpretation of confidence regions is based on a frequentist
notion of coverage probability. That is, in repeated sampling, the confidence region will
contain the true parameter value a specified proportion of the time. That the confidence
region (nearly) hits the target coverage probability is crucial to the validity of the resulting 
inference: on one hand, if the actual coverage probability is too high, then the confidence 
regions are likely too large to provide any meaningful notion of uncertainty; on the other 
hand, if the actual coverage probability is too low, then it is likely that the confidence 
region has a systematic bias, casting doubt on the accuracy of the results for the data at 
hand. Unfortunately, it is rare that a simple and exact confidence region is available; the 
well-known Student-t confidence interval for a Gaussian mean is one exception. Typically, 
an appeal to asymptotic theory is made, and confidence regions are built based on the 
simpler limiting distribution; confidence regions based on the asymptotic normality of
maximum likelihood estimators are one example. However, with this approach, one must
add to any coverage probability claim the caveat "for sufficiently large sample size."
Alternatively, numerical methods, such as the bootstrap (Efron and Tibshirani 1993), are
popular when a direct appeal to asymptotic theory is questionable. But validity of the 
bootstrap also depends on large-sample theory, so there are no non-asymptotic coverage 
probability guarantees for bootstrap confidence regions. 

This paper describes a new approach, resting on a theory of random sets. The initial 
step is to establish an association between the observable data, the unknown parameter, 
and a mostly arbitrary auxiliary variable. By "association" here we mean a suitable 
representation of the statistical model for the observable data. Alternatively, the as- 
sociation can be viewed as a sort of compatibility relation among the various inputs. 
Random sets supported in the auxiliary variable space — called predictive random sets — 
are propagated, via observed data and the specified statistical model, to random sets in 
the parameter space. These random sets in the parameter space are characterized by 
their belief functions or, alternatively, by their plausibility functions. These functions
also appear in the famous Dempster-Shafer theory (Dempster 2008; Shafer 1976), but
the approach described here is different. It is shown in Section 4 that, under very mild
conditions on the user-specified predictive random sets, exact confidence regions can be 
constructed via a suitable thresholding of the plausibility function. 

The remainder of the paper is organized as follows. Section 2 describes the general
statistical problem and defines confidence regions and coverage probability. Random sets
are described in Section 3, with a general overview in Section 3.1 and a presentation of the
important new concept, namely, predictive random sets, in Section 3.2. These sets are the
driving force behind the proposed approach. In Section 4 we first define the plausibility
function, which is nothing but a probability calculation relative to the distribution of the 
predictive random set, along with the corresponding plausibility region. Then we prove 
the main result that, under mild conditions on the model itself, if the predictive random 
set is valid, a property that is easily satisfied, then the corresponding plausibility regions 
hit the desired coverage probability. This is a finite sample result, not asymptotic. Here 
we find that certain aspects of the formal mathematical theory of random sets lead to
a relatively simple statement of the sufficient conditions for this result. Two illustrative
examples, involving models used in reliability theory, are presented in Section 5. Finally,
Section 6 contains a brief discussion.



2 Setup and notation 

Let $Y$ be an observable sample, taking values in the sample space $\mathbb{Y}$, with distribution
$P_{Y|\theta}$ depending on a parameter $\theta$ in a parameter space $\Theta$. Here $Y$ may be a vector of
$n$ (possibly independent) observations, so that $\mathbb{Y}$ is actually a product space, but it is
not necessary to be so specific here. The distribution $P_{Y|\theta}$ is called the sampling model,
and if the value of $\theta$ were known, then $Y$ could be simulated. In the present context, the
actual $\theta$ is unknown, and the goal is to use data $Y$ to make inference about $\theta$.

In statistical applications, it is typical to summarize data $Y$ with a statistic $T = T(Y)$.
Just like $Y$, the statistic $T$ has a sampling distribution, denoted by $P_{T|\theta}$, which usually
depends on $\theta$. In fact, one usually takes $T$ to be a minimal sufficient statistic for $\theta$,
though deviation from this guiding principle is sometimes warranted; see Section 5.1. The
initial reduction of $Y$ to a minimal sufficient statistic $T$ can be justified by the standard
arguments of Fisher or, more generally, by those of Martin and Liu (2012). The classical
frequentist approach to statistical inference derives procedures, such as hypothesis tests,
based on the sampling distribution of $T$. In this paper, focus is on confidence regions. Let
$C_\alpha(T)$ be a $T$-dependent subset of $\Theta$. For given $\alpha \in (0, 1)$, $C_\alpha(T)$ is called a $100(1-\alpha)\%$
confidence region for $\theta$ if

$$P_{T|\theta}\{C_\alpha(T) \ni \theta\} \geq 1 - \alpha, \quad \forall\, \theta \in \Theta. \tag{1}$$

The left-hand side of (1) is the coverage probability of $C_\alpha(T)$, and the definition of
confidence region places a condition on this coverage probability, namely, that it must
be at least the $1-\alpha$ level. In words, (1) states that, if the confidence region $C_\alpha(T)$ is
used in many examples involving data $Y \sim P_{Y|\theta}$ and statistic $T = T(Y)$, then roughly
$100(1-\alpha)\%$ of the realized regions will contain the true parameter value. In other words,
if $\alpha$ is small, e.g., $\alpha = 0.05$, then $\{C_\alpha(T) \not\ni \theta\}$ is a rare event with respect to the sampling
distribution of $T$. So, in practice, users will use this "rare event" interpretation to justify
the conclusion that their calculated confidence region contains the true $\theta$ value.

Clearly, it is most efficient for the $100(1-\alpha)\%$ confidence region to have coverage
probability equal to $1-\alpha$; this would indicate that, in some sense, its size is just right. In
practice, however, for the sake of analytical or computational convenience, this efficiency
is sacrificed. That is, confidence regions used in practice may not exactly satisfy (1).
Equality may hold in (1) only as $n \to \infty$, and for finite $n$, the true coverage probability
may be above or below the desired $1-\alpha$ level. It would be desirable to have a general way
to construct regions $C_\alpha(T)$ that satisfy (1) for all $n$, especially if equality can be attained
in some cases. The objective of this note is to present and justify such a construction.
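
To make the finite-sample gap concrete, the following is a minimal simulation sketch, not part of the original development. It assumes a hypothetical setting: exponential data with mean 1 and the usual large-sample interval for the mean, sample mean plus or minus 1.96 times the estimated standard error. The function names and the sample sizes are illustrative choices.

```python
import math
import random

def wald_interval(sample, z=1.96):
    """Asymptotic 95% interval for an exponential mean: mean +/- z * mean / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    half_width = z * mean / math.sqrt(n)  # estimated standard error is mean / sqrt(n)
    return mean - half_width, mean + half_width

def estimated_coverage(true_mean, n, replications, rng):
    """Fraction of simulated samples whose interval contains the true mean."""
    covered = 0
    for _ in range(replications):
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        lower, upper = wald_interval(sample)
        if lower <= true_mean <= upper:
            covered += 1
    return covered / replications

rng = random.Random(0)
small_n = estimated_coverage(true_mean=1.0, n=5, replications=20000, rng=rng)
large_n = estimated_coverage(true_mean=1.0, n=100, replications=5000, rng=rng)
print(small_n, large_n)
```

In this sketch the estimated coverage for the small sample falls noticeably short of the nominal 0.95, while the larger sample comes much closer, which is the behavior the paragraph above describes.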

Towards this, we must first digress a bit to introduce an alternative representation of
the sampling model $P_{T|\theta}$, one that involves an auxiliary variable. Let $\mathbb{U}$ be an (arbitrary)
auxiliary variable space, equipped with a probability measure $P_U$. Then choose a function
$a : \mathbb{U} \times \Theta \to \mathbb{T}$, where $\mathbb{T}$ denotes the space of values of $T$, such that, if $U \sim P_U$, then
$a(U, \theta) \sim P_{T|\theta}$. In other words, the sampling distribution of $T$ can be characterized by
the following recipe:

$$\text{sample } U \sim P_U \text{ and set } T = a(U, \theta). \tag{2}$$

This is a familiar notion in the context of simulation, e.g., the inverse probability transform,
but here the motivation is different. The function $a$ forges an association
between data $T$, parameter $\theta$, and auxiliary variable $U$. The point is that, once $T = t$
is observed, the very best possible inference about $\theta$ is obtained if and only if the
corresponding $U$ value is observed. As $U$ is, by construction, unobservable, the inference
problem can be recast into one of accurately guessing or predicting the unobserved $U$.
This is where random sets will come in handy.
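
As a small illustration of recipe (2), the sketch below assumes a hypothetical model in which $T$ is exponential with rate $\theta$, so that the inverse probability transform gives $a(u, \theta) = -\log(1-u)/\theta$. The model and the names `associate` and `sample_statistic` are assumptions made for this example only.

```python
import math
import random

def associate(u, theta):
    """Association a(u, theta) for the assumed model T ~ Exp(rate=theta).

    This is the inverse-CDF map: a(u, theta) = -log(1 - u) / theta.
    """
    return -math.log(1.0 - u) / theta

def sample_statistic(theta, size, rng):
    """Recipe (2): draw U ~ Unif(0, 1) and set T = a(U, theta)."""
    return [associate(rng.random(), theta) for _ in range(size)]

rng = random.Random(0)
draws = sample_statistic(theta=2.0, size=20000, rng=rng)
sample_mean = sum(draws) / len(draws)  # should be close to 1 / theta
print(sample_mean)
```

The sample mean of the simulated statistics should be close to $1/\theta$, confirming that the association reproduces the assumed sampling model.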

3 Random sets 

3.1 A general overview 

Let $\mathbb{U}$ be a space and $\mathcal{S}$ a random set, taking values in a collection of subsets of $\mathbb{U}$, with
distribution $P_{\mathcal{S}}$. There is a rigorous theory for random sets, presented beautifully in,
e.g., Molchanov (2005). Our case here turns out to be a relatively simple special case of
the general theory so, for the sake of simplicity, we shall ignore the various topological
and measure-theoretical technicalities that appear in more formal treatments.

There are a variety of related ways to describe the distribution of the random set. One
approach is via the capacity functional (Molchanov 2005, Def. 1.4). Another description,
popular in applications involving uncertainty (e.g., statistics, artificial intelligence, etc.),
is the belief function, $P_{\mathcal{S}}\{\mathcal{S} \subseteq K\}$, $K \subseteq \mathbb{U}$. One can easily see that the belief function is
a formal analogue to the distribution function of a random variable. One key difference
when dealing with random sets, compared to random variables, is that the complementation
law generally fails, i.e., $P_{\mathcal{S}}\{\mathcal{S} \subseteq K\} + P_{\mathcal{S}}\{\mathcal{S} \subseteq K^c\} \leq 1$, with equality for all $K$ if
and only if $\mathcal{S}$ is a singleton set with $P_{\mathcal{S}}$-probability 1. One can discuss belief functions
without explicitly talking about random sets (e.g., Shafer 1979), though we shall not
do so here. One particularly natural way that belief functions can emerge is through a
sort of push-forward operation on a probability measure via a set-valued mapping; see,
e.g., Dempster (1967), Nguyen (1978), and the discussion following the proof of
Proposition 1 below. There is now a wide variety of theoretical developments and applications of
belief functions; see the volume edited by Yager and Liu (2008). In the remainder of this
section, we shall focus only on those details that will be important in the sequel.

Consider now the special case where the random set is nested. In other words, the
collection $\mathbb{S} \subseteq 2^{\mathbb{U}}$ of possible realizations of $\mathcal{S}$, called the support of $\mathcal{S}$, satisfies:

$$\text{for any } S, S' \in \mathbb{S}, \text{ either } S \subseteq S' \text{ or } S \supseteq S'. \tag{3}$$

In this case, the belief function corresponding to $\mathcal{S}$ is called consonant (Aregui and Denoeux
2008; Balch 2012; Shafer 1976, 1987). […] continuity properties of the probability $P_{\mathcal{S}}$,
that the belief function is condensable. These
together imply that the belief function (for $\mathcal{S}$) is fully characterized (see Shafer 1987,
Sec. V.G) by the contour function, given by

$$f_{\mathcal{S}}(u) = P_{\mathcal{S}}\{\mathcal{S} \ni u\}, \quad u \in \mathbb{U}, \tag{4}$$

i.e., the probability that the random set $\mathcal{S}$ catches the fixed point $u \in \mathbb{U}$. As this is
an ordinary function, not a set function, it will be easier to work with than the belief
function. That this ordinary function captures the entire belief function can be seen by
the formula

$$P_{\mathcal{S}}\{\mathcal{S} \subseteq K\} = 1 - \sup_{u \in K^c} f_{\mathcal{S}}(u).$$

As we shall see in the next subsection, nested random sets, together with their
corresponding contour functions (4), play an important role in this new theory.



3.2 Predictive random sets 



In their investigations into the use of Dempster-Shafer theory for statistical inference,
Martin et al. (2010) and Zhang and Liu (2011) observe that the corresponding belief
functions have proper calibration properties only for certain classes of assertions or
hypotheses. To rectify this mis-calibration, the previous authors argue that the
Dempster-Shafer focal elements need to be enlarged, and that this can be accomplished by using
what are called predictive random sets. This combination of predictive random sets with
the Dempster-Shafer theory of belief functions provides the mathematical backbone of
a new approach, the so-called inferential model (IM) approach; for the complete details,
see Martin and Liu (2013a). Here and in Section 4 we shall review this general theory
with an emphasis on the construction of exact confidence regions.

Given the auxiliary space $\mathbb{U}$, equipped with measure $P_U$, let $\mathbb{S}$ be a collection of
$P_U$-measurable subsets of $\mathbb{U}$. The collection $\mathbb{S}$ will serve as the support for the predictive
random set; without loss of generality, we shall assume that $\mathbb{S}$ contains both $\emptyset$ and
$\mathbb{U}$. Write $\mathcal{S}$ for the predictive random set, $P_{\mathcal{S}}$ for its distribution, and $f_{\mathcal{S}}(u)$ for the
corresponding contour function (4). An apparently new concept in random set theory
is that of validity. That is, the predictive random set $\mathcal{S}$ is valid if $f_{\mathcal{S}}(U)$ is stochastically
no smaller than Unif(0, 1) when $U \sim P_U$, i.e., $P_U\{f_{\mathcal{S}}(U) \leq \alpha\} \leq \alpha$ for all $\alpha \in (0, 1)$. It will
be shown in Section 4 that validity of
the predictive random set leads to confidence regions with exact coverage probabilities.

Here, the interesting question is how to construct a predictive random set that satisfies
this validity criterion. The answer is surprisingly simple. First, take $\mathcal{S}$ to be nested, so
that its support $\mathbb{S}$ satisfies (3) and the belief function is consonant. As discussed in
the previous subsection, this implies that the contour function $f_{\mathcal{S}}$ fully characterizes the
distribution $P_{\mathcal{S}}$. Now, since validity implicitly requires some connection between $P_{\mathcal{S}}$ and
$P_U$, our second condition should forge this connection. Indeed, we shall consider $\mathcal{S}$ with
contour functions $f_{\mathcal{S}}$ that satisfy

$$f_{\mathcal{S}}(u) = \inf_{S \in \mathbb{S} :\, S \ni u} P_U(S), \quad u \in \mathbb{U}. \tag{5}$$

We can now prove that these two conditions are sufficient for validity.

Proposition 1. If $\mathcal{S}$ is nested and its contour function satisfies (5), then it is valid.

Proof. Pick any $\alpha \in (0, 1)$ and set $S_\alpha = \bigcup\{S \in \mathbb{S} : P_U(S) \leq \alpha\}$, the largest $S \in \mathbb{S}$ with
$P_U$-probability no more than $\alpha$. Based on (5), we can easily see that $f_{\mathcal{S}}(u) \leq \alpha$ if and
only if $u \in S_\alpha$. Therefore, $P_U\{f_{\mathcal{S}}(U) \leq \alpha\} = P_U(S_\alpha) \leq \alpha$. This holds for all $\alpha$, so $f_{\mathcal{S}}(U)$
is stochastically no smaller than Unif(0, 1), and the claimed validity holds. □

Nested predictive random sets are simple to construct. For example, suppose $P_U$ is a
Unif(0, 1) distribution and define a predictive random set $\mathcal{S}$ given by

$$\mathcal{S} = \{u : |u - 0.5| < |U - 0.5|\}, \quad \text{with } U \sim P_U. \tag{6}$$

Then the support $\mathbb{S}$ of $\mathcal{S}$ contains all symmetric intervals $S$ centered at 0.5 of width less
than or equal to 1, which is clearly a nested collection. Next, consider the distribution
$P_{\mathcal{S}}$ inherited from $P_U$, i.e.,

$$P_{\mathcal{S}}\{\mathcal{S} \subseteq K\} = P_U\bigl\{\{u : |u - 0.5| < |U - 0.5|\} \subseteq K\bigr\}.$$

With a little effort, the reader can easily convince him/herself that $P_{\mathcal{S}}$ satisfies (5).
Therefore, $\mathcal{S}$ satisfies the conditions of Proposition 1 and, hence, is valid. In fact, validity
of $\mathcal{S}$ can be shown directly by checking that the contour function $f_{\mathcal{S}}(\cdot)$ in (4) satisfies
$f_{\mathcal{S}}(U) \sim$ Unif(0, 1) for $U \sim P_U$. Martin and Liu (2013a) refer to (6) as the "default"
predictive random set.
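
The direct check mentioned above can be sketched numerically. The code below works only from definition (4): the default random set in (6) catches a fixed point $u$ exactly when $|u - 0.5| < |U - 0.5|$, which for $U \sim$ Unif(0, 1) has probability $1 - |2u - 1|$. The function names and the Monte Carlo sample sizes are illustrative assumptions.

```python
import random

def contour_default(u):
    """Contour function (4) of the default predictive random set (6).

    S catches u exactly when |u - 0.5| < |U - 0.5|; since |U - 0.5| is
    uniform on (0, 0.5) for U ~ Unif(0, 1), this probability is 1 - |2u - 1|.
    """
    return 1.0 - abs(2.0 * u - 1.0)

def catch_frequency(u, draws, rng):
    """Monte Carlo estimate of P{S contains u} for the default random set."""
    hits = 0
    for _ in range(draws):
        big_u = rng.random()
        if abs(u - 0.5) < abs(big_u - 0.5):
            hits += 1
    return hits / draws

def validity_check(alpha, draws, rng):
    """Estimate P{f_S(U) <= alpha}; validity requires this to be at most alpha."""
    count = sum(contour_default(rng.random()) <= alpha for _ in range(draws))
    return count / draws

rng = random.Random(1)
estimate = catch_frequency(0.3, 50000, rng)  # compare with contour_default(0.3)
tail = validity_check(0.1, 50000, rng)       # compare with alpha = 0.1
print(estimate, contour_default(0.3), tail)
```

The first comparison checks the closed-form contour against its definition; the second checks that $f_{\mathcal{S}}(U)$ behaves like a Unif(0, 1) variable at one level $\alpha$.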
The previous arguments can be generalized. For example, let $P_U$ be a general
non-atomic distribution on $\mathbb{U}$ and take $h$ to be a continuous nowhere constant function from
$\mathbb{U}$ to $\mathbb{R}$. Then it follows similarly that the predictive random set $\mathcal{S}$, defined by

$$\mathcal{S} = \{u : h(u) < h(U)\}, \quad \text{with } U \sim P_U, \tag{7}$$

is admissible and, hence, valid. In many scalar parameter problems, by performing
suitable auxiliary variable transformations, one can get $\mathbb{U} = (0, 1)$ and $P_U =$ Unif(0, 1),
so that the default predictive random set (6) can be used. However, this more general
construction of a valid predictive random set proves useful in cases where the auxiliary
variable space $\mathbb{U}$ is of dimension two or more.
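
For a two-dimensional illustration of (7), the sketch below assumes $U$ is uniform on the unit square and takes $h$ to be the sup-norm distance from the center of the square. Both choices are hypothetical and made only so that the contour function has a simple closed form, $1 - (2h(u))^2$, which the code compares against a Monte Carlo estimate taken directly from the definition.

```python
import random

def h(u):
    """Hypothetical choice of h: sup-norm distance from the centre of the square."""
    return max(abs(u[0] - 0.5), abs(u[1] - 0.5))

def contour(u):
    """f_S(u) = P{h(U) > h(u)} for U uniform on the unit square.

    Here P{h(U) <= r} = (2r)^2 for 0 <= r <= 0.5, the area of a centred square.
    """
    return 1.0 - (2.0 * h(u)) ** 2

def contour_monte_carlo(u, draws, rng):
    """Estimate P{S contains u} directly from the definition in (7)."""
    target = h(u)
    hits = sum(target < h((rng.random(), rng.random())) for _ in range(draws))
    return hits / draws

rng = random.Random(2)
point = (0.7, 0.4)
exact = contour(point)
approx = contour_monte_carlo(point, 40000, rng)
print(exact, approx)
```

Other choices of $h$, such as Euclidean distance from the center, would work in the same way but lead to a less convenient closed form.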



4 Plausibility regions 

4.1 Plausibility functions for statistical inference 

Recall the auxiliary variable representation of the sampling model, i.e., $T = a(U, \theta)$,
where $T$ is the statistic of interest, and $U \sim P_U$. Let $T = t$ be the observed statistic. If
the auxiliary variable $U$ were also observed, say $U = u$, then the best possible inference
on $\theta$ could be obtained, and would be represented by the set

$$\Theta_t(u) = \{\theta : t = a(u, \theta)\}.$$

This set could be a singleton, but need not be. The idea is that if the auxiliary variable
were observed, then given $T = t$, one can solve for the parameter of interest, and $\Theta_t(u)$
is exactly this set of solutions. In other words, $\Theta_t(u)$ defines a $t$-dependent compatibility
relation (Shafer 1987) on $\Theta$ and $\mathbb{U}$.

Since the auxiliary variable $U$ is not observable, it is not clear exactly how we should
make use of the sets $\Theta_t(u)$. In the classical Dempster-Shafer context, a belief function
is defined on $\Theta$ by pushing the measure $P_U$ on $\mathbb{U}$ forward through the $t$-dependent
set-valued mapping $\Theta_t(\cdot)$, creating a new random set $\Theta_t(U)$, with $U \sim P_U$. But as we
indicated above in Section 3.2, the Dempster-Shafer belief functions, generally, are not
properly calibrated, and here is where the predictive random set $\mathcal{S}$ comes into play. The
validity property for $\mathcal{S}$ ensures that it will hit its target, a draw from $P_U$, with large
$P_{\mathcal{S}}$-probability. Therefore, we may push the measure $P_{\mathcal{S}}$, or its corresponding belief function,
forward, via the map $\Theta_t(\cdot)$, to obtain the bigger random set

$$\Theta_t(\mathcal{S}) = \bigcup_{u \in \mathcal{S}} \Theta_t(u). \tag{8}$$

The intuition is that we expect $\Theta_t(\mathcal{S})$ to contain the true $\theta$ with large $P_{\mathcal{S}}$-probability. So,
we understand $\Theta_t(\mathcal{S})$ as a (random) set of "reasonable" guesses of $\theta$: for a given $A \subseteq \Theta$,
if $\Theta_t(\mathcal{S}) \cap A \neq \emptyset$, then we cannot rule out the possibility that the true $\theta$ resides in $A$.
By the plausibility of $A$ we mean the $P_{\mathcal{S}}$-probability that $\Theta_t(\mathcal{S}) \cap A \neq \emptyset$,

$$\mathrm{pl}_t(A) = P_{\mathcal{S}}\{\Theta_t(\mathcal{S}) \cap A \neq \emptyset\}, \quad A \subseteq \Theta. \tag{9}$$

We shall refer to $\mathrm{pl}_t(A)$ as the plausibility function at $A$; though the notation does not
reflect this, the reader should keep in mind that $\mathrm{pl}_t$ depends on $P_{\mathcal{S}}$.
A few remarks about the above construction are in order. First, we could have
equivalently started by defining a belief function $\mathrm{bel}_t(A) = P_{\mathcal{S}}\{\Theta_t(\mathcal{S}) \subseteq A\}$ for the
random set $\Theta_t(\mathcal{S})$ and then the plausibility function $\mathrm{pl}_t(A) = 1 - \mathrm{bel}_t(A^c)$. We will
not need the belief function in what follows, so the presentation here is more direct.
Second, the fact that the new random set in (8) is generally larger than that of the
classical Dempster-Shafer analysis leads to smaller belief functions. It is this squashing
of the Dempster-Shafer belief function or, equivalently, the boosting of the plausibility
function, that accounts for the improved calibration. Indeed, as we show below, if the
predictive random set is valid, then the squashing/boosting will be just enough to attain
the desired calibration. Third, the argument here for combining $\Theta_t(\cdot)$ with $\mathcal{S}$ as in (8)
is just a special case of Dempster's rule of combination (Dempster 1967, 2008), though
writing out the details formally perhaps does not provide any additional insight. The key
point is that Dempster's argument does not require that uncertainty on the $\mathbb{U}$-space be
summarized with a genuine probability measure. In particular, the same line of reasoning
applies if uncertainty on $\mathbb{U}$ is described via a belief function, like in our present case.

If $P_{\mathcal{S}}\{\Theta_t(\mathcal{S}) = \emptyset\} > 0$, then one must adjust the formula (9) by conditioning on the
event that $\Theta_t(\mathcal{S}) \neq \emptyset$. To avoid such conditioning here, we assume that

$$\Theta_t(u) \neq \emptyset \quad \text{for all } (t, u) \text{ pairs.} \tag{10}$$

This assumption essentially boils down to there being no non-trivial constraints on the
parameter $\theta$ in the sampling model $P_{T|\theta}$. An example of a non-trivial constraint is in a
Poisson problem where the mean $\theta$ has a positive lower bound. Most regular problems,
including the examples in Section 5, satisfy (10), though there are some that do not.
Assumption (10) is not necessary to construct plausibility regions with the desired
coverage probabilities, but it will make our presentation easier. The correction requires a
relatively technical kind of stretching of the predictive random sets to maintain validity;
see Ermini Leaf and Liu (2012).



For the important special case where $A = \{\theta\}$ is a singleton, we write $\mathrm{pl}_t(\theta) =
\mathrm{pl}_t(\{\theta\}) = P_{\mathcal{S}}\{\Theta_t(\mathcal{S}) \ni \theta\}$. Note that this special plausibility function is just the contour
function (4) corresponding to the new nested random set $\Theta_t(\mathcal{S})$. This plausibility function
also gives rise to the $100(1-\alpha)\%$ plausibility region:

$$\Pi_\alpha(t) = \{\theta : \mathrm{pl}_t(\theta) > \alpha\}. \tag{11}$$

As we demonstrate below, if $\mathcal{S}$ is valid, then the plausibility region $\Pi_\alpha(T)$ has coverage
probability at least $1-\alpha$ and, in many cases, equality is attained.
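
A minimal sketch of definition (11) follows, under a hypothetical model chosen for simplicity: a single statistic $T$ that is exponential with rate $\theta$, with association $t = -\log(1-u)/\theta$ and the default predictive random set. In that case the unique $u$ compatible with $(t, \theta)$ is $u = 1 - e^{-\theta t}$, and the plausibility is the default contour function evaluated there. The observed value `t_obs` and the grid are illustrative assumptions.

```python
import math

def plausibility(theta, t):
    """pl_t(theta) under the default predictive random set.

    Assumed model: T ~ Exp(rate=theta), so t = a(u, theta) is solved by
    u = 1 - exp(-theta * t), and pl_t(theta) is the contour function there.
    """
    u = 1.0 - math.exp(-theta * t)
    return 1.0 - abs(2.0 * u - 1.0)

def plausibility_region(t, alpha):
    """Closed-form endpoints of {theta : pl_t(theta) > alpha} for this model."""
    lower = -math.log(1.0 - alpha / 2.0) / t
    upper = -math.log(alpha / 2.0) / t
    return lower, upper

def region_on_grid(t, alpha, grid):
    """The same region by brute-force thresholding, as in definition (11)."""
    return [theta for theta in grid if plausibility(theta, t) > alpha]

t_obs, alpha = 1.5, 0.05
lower, upper = plausibility_region(t_obs, alpha)
grid = [0.001 * k for k in range(1, 5001)]
kept = region_on_grid(t_obs, alpha, grid)
print(lower, upper, min(kept), max(kept))
```

The grid version is the literal thresholding rule; the closed form is available here only because the assumed model is so simple.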



4.2 Coverage probability results 

The first result shows that $\mathrm{pl}_T(\theta)$, for $T \sim P_{T|\theta}$, is stochastically no smaller than
Unif(0, 1) under mild conditions. From this, the plausibility region's advertised attainment
of the nominal coverage probability follows easily.

Proposition 2. Fix $\theta \in \Theta$. If $\mathcal{S}$ is valid and (10) holds, then $\mathrm{pl}_T(\theta)$ is stochastically no
smaller than Unif(0, 1) when $T \sim P_{T|\theta}$.

Proof. From the alternative description of the sampling model $P_{T|\theta}$ in (2), for $T \sim P_{T|\theta}$,
there exists a corresponding $U_T \sim P_U$ such that $T = a(U_T, \theta)$. Moreover, it follows easily
from the definition of $\Theta_t(\mathcal{S})$ that $\Theta_T(\mathcal{S}) \ni \theta$ if and only if $\mathcal{S} \ni U_T$. Therefore, $\mathrm{pl}_T(\theta) =
P_{\mathcal{S}}\{\Theta_T(\mathcal{S}) \ni \theta\} = P_{\mathcal{S}}\{\mathcal{S} \ni U_T\} = f_{\mathcal{S}}(U_T)$. Since $\mathcal{S}$ is valid, $f_{\mathcal{S}}(U_T)$ is stochastically no
smaller than Unif(0, 1), as a function of $U_T \sim P_U$, and so the claim follows. □

There are two relevant results that can be derived from Proposition 2 and its proof.

• The first is that, for any $\alpha \in (0, 1)$, the coverage probability of the plausibility
region $\Pi_\alpha(T)$ in (11) is at least $1-\alpha$, i.e., $P_{T|\theta}\{\Pi_\alpha(T) \ni \theta\} \geq 1-\alpha$ for all $\theta$. To
see this, note that $P_{T|\theta}\{\Pi_\alpha(T) \ni \theta\} = P_{T|\theta}\{\mathrm{pl}_T(\theta) > \alpha\}$. By Proposition 2, $\mathrm{pl}_T(\theta)$
is stochastically no smaller than Unif(0, 1). This implies that the latter probability
is no smaller than $P\{\mathrm{Unif}(0, 1) > \alpha\} = 1-\alpha$, hence the claim.

• Second, confidence regions can be constructed directly from $\Theta_t(\cdot)$ and the support
sets $S \in \mathbb{S}$ for the predictive random sets. Indeed, for fixed $S \in \mathbb{S}$, we know from the
above proof that $\Theta_T(S) \ni \theta$ if and only if $U_T \in S$. So, $P_{T|\theta}\{\Theta_T(S) \ni \theta\} = P_U(S)$,
and if we select $S$ such that $P_U(S) = 1-\alpha$, then $\Theta_T(S)$ is a $100(1-\alpha)\%$ confidence
region for $\theta$. Therefore, an alternative $100(1-\alpha)\%$ confidence region construction
selects the smallest $S$ with $P_U(S) = 1-\alpha$ and takes $C_\alpha(t) = \Theta_t(S)$.
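
The first of these consequences can be checked by simulation. The sketch below reuses a hypothetical model, $T$ exponential with rate $\theta$ and the default predictive random set, and estimates how often the plausibility region (11) contains the true $\theta$. The parameter value, the level, and the number of replications are illustrative assumptions.

```python
import math
import random

def plausibility(theta, t):
    """pl_t(theta) for the assumed model T ~ Exp(rate=theta), default random set."""
    u = 1.0 - math.exp(-theta * t)
    return 1.0 - abs(2.0 * u - 1.0)

def estimated_coverage(theta, alpha, replications, rng):
    """Fraction of simulated T for which the region (11) contains the true theta."""
    covered = 0
    for _ in range(replications):
        t = rng.expovariate(theta)          # T ~ Exp(rate=theta)
        if plausibility(theta, t) > alpha:  # theta lies in the plausibility region
            covered += 1
    return covered / replications

rng = random.Random(3)
coverage = estimated_coverage(theta=2.0, alpha=0.05, replications=100000, rng=rng)
print(coverage)
```

Because $T$ is continuous and the default random set is used, the estimate should sit close to $1-\alpha$ up to Monte Carlo error, even though the "sample" here consists of a single observation.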

An important question is: under what conditions is the coverage probability exactly
equal to $1-\alpha$ or, equivalently, when is $\mathrm{pl}_T(\theta)$, with $T \sim P_{T|\theta}$, exactly uniformly
distributed? It turns out that there are two conditions needed. First, $T$ must have a
continuous distribution $P_{T|\theta}$; otherwise $\mathrm{pl}_T(\theta)$ cannot be continuous. Second, the
predictive random set must be exact, not just valid, i.e., $f_{\mathcal{S}}(U)$ must be exactly Unif(0, 1) for
$U \sim P_U$. This exactness property holds for the default predictive random set (6) and
its generalized version (7). Therefore, for problems with continuous $T$, if we choose an
exact predictive random set, such as one of those in (6) or (7), then the plausibility region
$\Pi_\alpha(T)$ has coverage probability exactly $1-\alpha$.



5 Examples 

5.1 Power law process 

Consider a continuous-time non-homogeneous Poisson process $\{N_y : y \geq 0\}$, where the
mean function $m(y) = E(N_y)$ satisfies $m(y) = \psi y^\theta$, for $\psi, \theta > 0$. Such a process is called a
power law process (e.g., Gaudoin et al. 2006). The parameter $\psi$ is a scale parameter and
$\theta$ is a shape parameter. Though both $\psi$ and $\theta$ are unknown, the goal here is to construct
a plausibility interval for $\theta$ based on $n$ observed event times $Y_1 < \cdots < Y_n$.
For these data, the log-likelihood function for $(\psi, \theta)$ looks like

$$\ell(\psi, \theta) = n \log \psi + n \log \theta + (\theta - 1) \sum_{i=1}^n \log Y_i - \psi Y_n^\theta.$$

By the Neyman-Fisher factorization theorem, a joint sufficient statistic for $(\psi, \theta)$ is the
pair $(\sum_{i=1}^n \log Y_i, Y_n)$. This sufficient statistic is a one-to-one transformation of the
maximum likelihood estimator $(\hat\psi, \hat\theta)$, given by

$$\hat\theta = \frac{n}{\sum_{i=1}^{n-1} \log(Y_n / Y_i)} \quad \text{and} \quad \hat\psi = n / Y_n^{\hat\theta}.$$

Therefore, $(\hat\psi, \hat\theta)$ is a minimal sufficient statistic for $(\psi, \theta)$. Moreover, for all $\psi$, the vector
$\{\log(Y_n/Y_{n-1}), \ldots, \log(Y_n/Y_1)\}$ is distributed as a sorted sample of $n-1$ independent
random variables from an exponential distribution with mean $1/\theta$ (e.g., Crow 1974). For
simplicity, we take

$$T = n / \hat\theta = \sum_{i=1}^{n-1} \log(Y_n / Y_i),$$

which has a gamma distribution with shape $n-1$ and scale $1/\theta$. Some information
about $\theta$ is lost by ignoring the $\hat\psi$ component of the joint minimal sufficient statistic, but
the required marginalization strategy is beyond our present scope; see Martin and Liu
(2013b). So, we shall consider here the simple association

$$T = F^{-1}_{n-1,1/\theta}(U), \quad U \sim \mathrm{Unif}(0, 1),$$

where $F_{n-1,1/\theta}$ denotes the gamma distribution function with shape $n-1$ and scale $1/\theta$.
If the default predictive random set $\mathcal{S}$ in (6) is used for $U$, then the plausibility function
turns out to be

$$\mathrm{pl}_t(\theta) = 1 - |2 F_{n-1,1/\theta}(t) - 1|, \quad \theta > 0,$$

which can be readily evaluated numerically. Then the $100(1-\alpha)\%$ plausibility interval
$\Pi_\alpha(t)$ for $\theta$ is given by

$$\Pi_\alpha(t) = \{\theta : \mathrm{pl}_t(\theta) > \alpha\} = \{\theta : \alpha/2 < F_{n-1,1/\theta}(t) < 1 - \alpha/2\}.$$

Since $1/\theta$ is a scale parameter in $F_{n-1,1/\theta}$, the right-hand side above can be rewritten
as $\{\theta : \alpha/2 < F_{n-1,1}(\theta t) < 1 - \alpha/2\}$. Therefore, if we let $\gamma_{n-1,1}(q)$ denote the $q$th quantile
of the gamma distribution with shape $n-1$ and scale 1, then the plausibility interval can
be written as a genuine interval,

$$\left( \frac{\gamma_{n-1,1}(\alpha/2)}{t}, \; \frac{\gamma_{n-1,1}(1 - \alpha/2)}{t} \right).$$

This is equivalent to the exact confidence interval given in equation (6) of Gaudoin et al.
(2006) in terms of chi-square quantiles.
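
The interval above is easy to compute. The following sketch uses only the Python standard library: because the shape $n-1$ is an integer, the gamma distribution function with scale 1 has the finite-sum (Erlang) form, and its quantiles can be found by bisection. The event times in `times` are made-up numbers used only to exercise the functions, and all function names are illustrative.

```python
import math

def erlang_cdf(x, shape):
    """Gamma CDF with integer shape and scale 1 (the Erlang distribution)."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for j in range(1, shape):
        term *= x / j          # term equals x**j / j!
        total += term
    return 1.0 - math.exp(-x) * total

def erlang_quantile(q, shape, tol=1e-10):
    """q-th quantile by bisection on the monotone CDF."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape) < q:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def plausibility(theta, t, n):
    """pl_t(theta) = 1 - |2 F_{n-1,1}(theta * t) - 1| for the shape parameter."""
    return 1.0 - abs(2.0 * erlang_cdf(theta * t, n - 1) - 1.0)

def plausibility_interval(event_times, alpha):
    """Statistic t and the endpoints gamma(alpha/2)/t, gamma(1 - alpha/2)/t."""
    n = len(event_times)
    t = sum(math.log(event_times[-1] / y) for y in event_times[:-1])
    lower = erlang_quantile(alpha / 2.0, n - 1) / t
    upper = erlang_quantile(1.0 - alpha / 2.0, n - 1) / t
    return t, lower, upper

# Hypothetical ordered event times, used only to exercise the functions.
times = [0.8, 2.1, 3.5, 4.2, 6.0, 7.7, 9.1, 10.4]
t_obs, lower, upper = plausibility_interval(times, alpha=0.05)
print(t_obs, lower, upper)
```

By construction, the plausibility function evaluated at either endpoint equals $\alpha$, which gives a quick self-check of the computation.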



5.2 Exponential regression through the origin 

Consider a special case of an exponential log-linear model, where $Y_1, \ldots, Y_n$ are independent
exponential random variables and $Y_i$ has mean $e^{\theta x_i}$, $i = 1, \ldots, n$, for fixed covariates
$x_1, \ldots, x_n$. The goal is to produce a plausibility interval for the slope parameter $\theta$.
The log-likelihood function for $\theta$ looks like

$$\ell(\theta) = -\sum_{i=1}^n \bigl( \theta x_i + e^{\log Y_i - \theta x_i} \bigr),$$

and the likelihood equation is given by $\sum_{i=1}^n (e^{\log Y_i - \theta x_i} - 1) x_i = 0$. Let $T$ be the solution to
this equation, the maximum likelihood estimator of $\theta$. If $G_\theta = G_{\theta,x,n}$ is the distribution
function of $T$, then a suitable association is, again, given by

$$T = G_\theta^{-1}(U), \quad U \sim \mathrm{Unif}(0, 1).$$

The distribution function $G_\theta$ is not available in closed form, but it can be evaluated via
Monte Carlo. If the default predictive random set $\mathcal{S}$ in (6) is used for $U$, then again the
plausibility function for $\theta$ is of the form

$$\mathrm{pl}_t(\theta) = 1 - |2 G_\theta(t) - 1|, \quad \theta \in \mathbb{R}.$$

No closed-form expression is available for the plausibility function in this case, but, again, it is
relatively easy to evaluate numerically via Monte Carlo.
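
One way the Monte Carlo evaluation could be organized is sketched below; it is an illustration under stated assumptions, not the implementation behind the reported results. The covariates are assumed to be positive (which guarantees that the likelihood equation has a root that bisection can bracket), and the covariate values, seed, and number of draws are hypothetical.

```python
import math
import random

def score(theta, xs, ys):
    """Left-hand side of the likelihood equation: sum_i (Y_i e^{-theta x_i} - 1) x_i."""
    return sum((y * math.exp(-theta * x) - 1.0) * x for x, y in zip(xs, ys))

def mle(xs, ys, tol=1e-8):
    """Root of the score by bisection; the score is strictly decreasing in theta.

    Assumes all covariates are positive, so a sign change can always be bracketed.
    """
    lo, hi = -1.0, 1.0
    while score(lo, xs, ys) < 0.0:
        lo *= 2.0
    while score(hi, xs, ys) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs, ys) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def plausibility(theta, t, xs, draws, rng):
    """Monte Carlo version of pl_t(theta) = 1 - |2 G_theta(t) - 1|."""
    below = 0
    for _ in range(draws):
        ys = [math.exp(theta * x) * rng.expovariate(1.0) for x in xs]
        if mle(xs, ys) <= t:
            below += 1
    g_hat = below / draws  # estimate of G_theta(t)
    return 1.0 - abs(2.0 * g_hat - 1.0)

rng = random.Random(4)
xs = [0.1 * k for k in range(1, 11)]                     # hypothetical positive covariates
ys = [math.exp(1.0 * x) * rng.expovariate(1.0) for x in xs]  # simulated data, theta = 1
t_obs = mle(xs, ys)
pl_at_mle = plausibility(t_obs, t_obs, xs, draws=300, rng=rng)
pl_far = plausibility(t_obs + 3.0, t_obs, xs, draws=300, rng=rng)
print(t_obs, pl_at_mle, pl_far)
```

Evaluating `plausibility` over a grid of $\theta$ values and keeping those with plausibility above $\alpha$ gives a numerical version of the plausibility interval; more draws per grid point reduce the Monte Carlo noise at the cost of run time.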

For illustration, Figure 1 displays plots of the plausibility function $\mathrm{pl}_t(\theta)$, as a function
of $\theta$, for two simulated data sets, one of size $n = 10$, the other of size $n = 20$. Here the
covariate values […] and the true parameter value is $\theta = 1$. This
function is evaluated by a Monte Carlo integration step performed at each point $\theta$ on
the horizontal axis. For comparison, the endpoints of the 95% confidence interval based
on asymptotic normality of the maximum likelihood estimator are also displayed. In
both cases, the two intervals are comparable, which is to be expected. However, for
such small $n$, it is unlikely that the asymptotic normality has kicked in, so the actual
coverage probability of the latter is likely different from the target 0.95. The plausibility
interval, on the other hand, has coverage probability exactly equal to 0.95 based on the
theory developed in Section 4.2. Indeed, in a simulation of 5000 data sets of size $n = 10$,
under the same setup as above, the estimated coverage probabilities for the exact plausibility
interval and asymptotic confidence interval are 0.951 and 0.934, respectively.



6 Discussion 

In this paper, we discuss a new approach for the construction of confidence regions based 
on the theory of random sets. The key result is that if the predictive random set $\mathcal{S}$ for
the unobservable auxiliary variable $U$ is valid, in the sense that it misses its target not
too often, then the corresponding plausibility region has at least the nominal coverage
probability. It is important that this validity result is not asymptotic and, moreover, 
does not depend on any characteristic of the problem that is unknown. Therefore, it is 
generally quite easy to specify a valid predictive random set, and a default choice is given 
here and used in several examples to obtain practically useful results. 

Here the focus was on simplicity rather than generality. Though the two examples
involved only a scalar parameter, essentially the same strategy would apply for a
multi-parameter problem. A challenging problem in multi-parameter situations is to give an
exact confidence region for some component or, more generally, some scalar-valued
function of the full parameter. This was the actual setup in the power-law process example in
Section 5.1, though we sidestepped the main difficulty by ignoring a part of the minimal
sufficient statistic. To incorporate all the information in the minimal sufficient statistic
requires some careful manipulations which were beyond the present scope. A new and
detailed look at such problems is given in Martin and Liu (2013b).
Figure 1: Plausibility functions $\mathrm{pl}_t(\theta; \mathcal{S})$ versus $\theta$ in Section 5.2. Parentheses on the $\theta$-axis
mark the endpoints of the 95% confidence interval based on asymptotic normality of the
maximum likelihood estimator. Horizontal line at $\alpha = 0.05$ determines the endpoints of
the 95% plausibility interval.

The primary goal here was to construct confidence regions that attain the nominal 
coverage probability. We found that, in many cases, including the two examples in 
Section 5, the plausibility regions will actually hit this target on the nose. A natural
follow-up question is if these plausibility regions are "optimal" in some sense, i.e., do the 
plausibility regions have smallest average size, say, among all those regions that hit the 
desired coverage probability? This question is the focus of ongoing investigations. 